Introduction:

The data included in this analysis were assembled by earth scientist Zander Venter. Venter acquired all data from publicly available remote sensing datasets provided through Google Earth Engine. Averages were calculated for each country “at a reduction scale of about 10 km” and include measurements such as temperature, rainfall, elevation, and canopy cover.

Principal components analysis:

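# Attach packages used below:

library(tidyverse) # read_csv(), select(), drop_na(), %>%
library(here)      # here()
library(janitor)   # clean_names()
library(ggfortify) # autoplot() method for prcomp objects
library(plotly)    # ggplotly()

# Read in data:

world_env_vars <- read_csv(here("data", "world_env_vars.csv")) %>% 
  clean_names()

# Create PCA dataset: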

world_env_pca <- world_env_vars %>% 
  select(where(is.numeric)) %>% # keep only numeric columns
  select(-ends_with("_quart")) %>% # remove any column ending in '_quart'
  select(-ends_with("_month")) %>% # remove any column ending in '_month'
  select(-c(accessibility_to_cities, aspect)) %>% # omit two additional columns
  drop_na() %>% # drop rows with missing values
  scale() %>% # standardize variables measured on different orders of magnitude
  prcomp()
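
As a side note on the scaling step, the minimal sketch below uses the built-in USArrests data as a stand-in (not the country data from this analysis; the object names are illustrative). It suggests that piping through scale() before prcomp() should give the same loadings as letting prcomp() standardize internally with scale. = TRUE.

```r
# Stand-in data: built-in USArrests, not the dataset from this analysis
pca_piped <- prcomp(scale(USArrests))          # standardize first, then run PCA
pca_arg   <- prcomp(USArrests, scale. = TRUE)  # let prcomp() standardize internally

# Compare absolute loadings, since the sign of a component is arbitrary
loadings_match <- isTRUE(all.equal(abs(pca_piped$rotation), abs(pca_arg$rotation)))
print(loadings_match)
```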

world_env_pca$rotation
##                              PC1         PC2          PC3          PC4
## elevation           0.1310711077  0.04583268 -0.659947716  0.004699318
## slope               0.0003828036  0.23567957 -0.557324797  0.269104139
## cropland_cover      0.1465631407  0.25761056  0.300660652 -0.411449208
## tree_canopy_cover  -0.3503012176  0.21686452 -0.097420615 -0.114166226
## isothermality      -0.3746179043 -0.24622326 -0.091582555  0.029076552
## rain_mean_annual   -0.4008209565  0.13952922 -0.085473543 -0.059303682
## rain_seasonailty    0.0621198382 -0.47092794 -0.115192828 -0.004082272
## temp_annual_range   0.4129875297  0.03262162 -0.067977001 -0.188143053
## temp_diurnal_range  0.1899487615 -0.44263819 -0.178920279 -0.200944831
## temp_mean_annual   -0.2662247385 -0.41834393  0.115315504 -0.047042528
## temp_seasonality    0.3919145761  0.18740130 -0.006121805 -0.112541168
## wind                0.1466166122  0.05606025  0.273832445  0.795355755
## cloudiness         -0.2848952864  0.34180006 -0.007437192 -0.132708475
##                            PC5         PC6         PC7         PC8          PC9
## elevation           0.17304988 -0.39497332 -0.11825278  0.10027078 -0.458545918
## slope               0.31658777  0.46089248  0.18127610 -0.02301636  0.381741531
## cropland_cover      0.68018096  0.06402613 -0.01314082  0.41315499 -0.056064033
## tree_canopy_cover  -0.37325679  0.22839624  0.03645149  0.46521740 -0.447028020
## isothermality       0.07435174 -0.28873837  0.13563883  0.15836826  0.175963122
## rain_mean_annual   -0.13469390  0.21697761 -0.09541207  0.33095187  0.257088331
## rain_seasonailty    0.12036062  0.35836014 -0.74072109  0.06431418 -0.078076512
## temp_annual_range  -0.32085131  0.07245109 -0.06380262  0.16957173  0.203646619
## temp_diurnal_range -0.08563892 -0.25117385  0.19691623  0.46714546  0.368130439
## temp_mean_annual    0.09075995  0.23150309  0.11578585 -0.04035345 -0.052718639
## temp_seasonality   -0.32975952  0.17388444 -0.06745553  0.03839569  0.071125581
## wind                0.03304784 -0.13113962 -0.13688404  0.45267744  0.006109216
## cloudiness         -0.03676129 -0.38350274 -0.54400749 -0.09244911  0.391166274
##                           PC10        PC11        PC12         PC13
## elevation          -0.34990735 -0.04542848  0.01946759  0.006023194
## slope               0.17678677 -0.19239681 -0.03625869 -0.008670464
## cropland_cover     -0.01100688 -0.05803955  0.09060058 -0.003489179
## tree_canopy_cover   0.36781088 -0.23020098 -0.10054564 -0.016489872
## isothermality       0.18393542 -0.07976703  0.76316733 -0.061999094
## rain_mean_annual   -0.55641773  0.49870008  0.01276623  0.025245322
## rain_seasonailty    0.19616973  0.10925026  0.09110318  0.023565231
## temp_annual_range  -0.13372765 -0.17532063  0.15470482 -0.730282444
## temp_diurnal_range  0.15493793  0.01601380 -0.38228732  0.249500666
## temp_mean_annual   -0.49520055 -0.63226837 -0.11818559 -0.001518365
## temp_seasonality   -0.17402912 -0.22823763  0.41121018  0.630117369
## wind               -0.09618194 -0.12631299 -0.02211008 -0.010481027
## cloudiness          0.04837631 -0.37249703 -0.18834439  0.042951517
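
The loadings above show how each variable contributes to each component, but not how much variance each component captures. A minimal sketch of that calculation follows, again using the built-in USArrests data as a stand-in (the object names are illustrative and not part of the original analysis):

```r
# Stand-in data: built-in USArrests, not the dataset from this analysis
pca_example <- prcomp(scale(USArrests))

# The variance of each component is its squared standard deviation
var_explained <- pca_example$sdev^2 / sum(pca_example$sdev^2)

# Cumulative share, useful for judging how much a two-component biplot captures
cum_var_explained <- cumsum(var_explained)

print(round(var_explained, 3))
print(round(cum_var_explained, 3))
```

Applying the same two lines to world_env_pca$sdev would give the share of variance represented by PC1 and PC2 in the biplot.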
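
# Complete cases (including country names), used to color the biplot points.
# NAs are dropped across all columns here, so the rows should match the PCA input:

world_env_complete <- world_env_vars %>% 
  drop_na()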
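
The PCA input drops rows with missing values only in the selected columns, while world_env_complete drops rows with a missing value in any column. If those two sets of rows ever differ, the points and their country labels would no longer line up. The sketch below uses a small hypothetical data frame (toy, with made-up values and column names) and base R to show one way to guard against that, by filtering on the PCA columns once and reusing the result:

```r
# Hypothetical toy data: 'extra' stands for a column that is excluded from the PCA
toy <- data.frame(
  country = c("A", "B", "C", "D", "E"),
  temp    = c(10, 15, NA, 22, 18),
  rain    = c(300, 120, 500, 80, 210),
  extra   = c(1, NA, 3, 4, 5)
)

pca_cols <- c("temp", "rain")

# Keep rows that are complete for the PCA variables only
keep <- complete.cases(toy[, pca_cols])
toy_complete <- toy[keep, ]

# Run the PCA on exactly those rows
toy_pca <- prcomp(scale(toy_complete[, pca_cols]))

# The metadata and the PCA scores now describe the same rows
stopifnot(nrow(toy_complete) == nrow(toy_pca$x))

# Dropping NAs across every column would remove an additional row
rows_all_columns <- nrow(na.omit(toy))
print(c(pca_rows = nrow(toy_pca$x), all_column_rows = rows_all_columns))
```

A similar stopifnot() comparing nrow(world_env_complete) with nrow(world_env_pca$x) could be placed before plotting.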

Generate PCA biplot:

# Generate PCA plot:
# Assumptions: linear relationships between variables, continuous measured variables, sufficient sample size

biplot <- autoplot(world_env_pca,
         data = world_env_complete,
         colour = "country",
         loadings = TRUE, 
         loadings.colour = "gray60", # changes colors of the arrows
         loadings.label = TRUE, # displays variables
         loadings.label.colour = "black", # change font color
         loadings.label.vjust = 1) +
  #geom_text(aes(label = country), col = "gray50", size = 2) + # use this to add country names
  theme_minimal() +
  theme(legend.position = "none") # remove legend

ggplotly(biplot)

Figure 1: This PCA biplot was created using observations from 188 countries that were not missing data for the included variables. The loadings of each variable on the first two principal components are shown with gray arrows and labeled accordingly. The position of each point shows that country's location in multivariate space. The length of a loading arrow indicates how much of that variable's variance is represented in the plotted components; the shorter the arrow, the less of its variance lies in those directions. The angle between two arrows indicates the correlation between the corresponding variables. The plot is interactive: individual country names can be viewed by hovering over specific points. (Data courtesy of Zander Venter).
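
To illustrate the link between arrow angles and correlation, here is a minimal sketch using the built-in USArrests data as a stand-in. It assumes the loadings are scaled by each component's standard deviation, which is a common biplot convention; the plotting function used above may scale its arrows differently, so this is a guide to interpretation rather than a reproduction of the figure. The helper cos_angle() and the object names are illustrative.

```r
# Stand-in data: built-in USArrests, not the dataset from this analysis
pca_example <- prcomp(scale(USArrests))

# Scale each loading column by its component's standard deviation
scaled_loadings <- pca_example$rotation %*% diag(pca_example$sdev)

# Cosine of the angle between two vectors
cos_angle <- function(a, b) sum(a * b) / sqrt(sum(a^2) * sum(b^2))

# Using all components, the cosine reproduces the correlation
cos_all <- cos_angle(scaled_loadings["Murder", ], scaled_loadings["Assault", ])

# Using only the first two components (as in a biplot), it is an approximation
cos_2d <- cos_angle(scaled_loadings["Murder", 1:2], scaled_loadings["Assault", 1:2])

actual_cor <- cor(USArrests$Murder, USArrests$Assault)
print(c(all_components = cos_all, first_two = cos_2d, correlation = actual_cor))
```

The closer the first two components come to capturing all of the variance, the closer the two-dimensional cosine should be to the actual correlation.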

Summary: